
test: strengthen Coord/Spec/Coord ping-pong handoff regression test #5825

Closed
Copilot wants to merge 15 commits into main from copilot/create-test-for-handoff-issue

Copilot AI (Contributor) commented May 13, 2026

Motivation and Context

MessageMerger, the internal component that folds streaming AgentResponseUpdate items into a final AgentResponse, had an implicit contract with no tests validating its ordering and grouping behavior. This created two issues:

  1. Message ordering bug: When updates lacked CreatedAt timestamps, CompareByDateTimeOffset treated null timestamps as "greater than" any value, pushing untimestamped messages unpredictably to the end rather than preserving their arrival order. In multi-agent scenarios (handoff, group chat), this caused message reordering that broke conversation coherence.

  2. Missing invariant documentation: The merger's guarantees were never written down, and the code contained dead state (createdTimes HashSet) suggesting abandoned functionality. Future refactors risked silently breaking the contract.
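The ordering problem in item 1 can be reproduced in a few lines. This is a Python sketch rather than the project's C#, so it runs standalone; `Message`, `order_null_last`, `order_preserving_arrival`, and `at` are hypothetical names. `order_null_last` mimics the reported behavior (a missing timestamp sorts after every real one). `order_preserving_arrival` uses one possible technique, letting an untimestamped message inherit the last timestamp seen before it, and is not necessarily what the fix in `CompareByDateTimeOffset` does. Python's `sorted` is stable; .NET's `List<T>.Sort` is documented as an unstable sort, so if the merger sorts that way, ties between untimestamped messages could also come out in arbitrary order.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class Message:
    text: str
    created_at: Optional[datetime] = None  # stands in for the update's CreatedAt


def order_null_last(messages: List[Message]) -> List[Message]:
    """Mimics the reported bug: a missing timestamp compares greater than any value."""
    def key(m: Message) -> Tuple[bool, float]:
        has_no_ts = m.created_at is None
        return (has_no_ts, 0.0 if has_no_ts else m.created_at.timestamp())
    return sorted(messages, key=key)


def order_preserving_arrival(messages: List[Message]) -> List[Message]:
    """Hypothetical alternative: untimestamped messages inherit the last timestamp seen."""
    keys: List[Tuple[int, float]] = []
    last_seen: Optional[datetime] = None
    for m in messages:
        if m.created_at is not None:
            last_seen = m.created_at
        # Messages that arrive before any timestamp sort first.
        keys.append((0, 0.0) if last_seen is None else (1, last_seen.timestamp()))
    # sorted() is stable, so equal keys keep their arrival index order.
    order = sorted(range(len(messages)), key=lambda i: keys[i])
    return [messages[i] for i in order]


def at(seconds: float) -> datetime:
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


arrived = [Message("a", at(1)), Message("b"), Message("c", at(2)), Message("d")]
print([m.text for m in order_null_last(arrived)])           # ['a', 'c', 'b', 'd']
print([m.text for m in order_preserving_arrival(arrived)])  # ['a', 'b', 'c', 'd']
```

Carrying the previous timestamp forward gives every message a comparable key, so the sort works on a consistent total order. A comparator that simply returns 0 whenever one side lacks a timestamp would not be transitive (a null would "equal" both 1 and 2 while 1 < 2), and sorting with such a comparator is not guaranteed to preserve arrival order.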

Description

This PR fixes the message ordering issue, documents the merger invariants in ADR 0026, and adds comprehensive tests to pin the expected behavior.

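Bug fix in MessageMerger.CompareByDateTimeOffset: a missing CreatedAt no longer sorts after every timestamped message, so untimestamped messages keep their arrival order (invariant 2 below).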

ADR 0026 establishes three invariants:

  1. Single ResponseId per turn — Hosting executors must assign a ResponseId if the agent doesn't provide one; updates with ResponseId == null are "dangling" and appended at the end
  2. Output order preservation — When updates lack CreatedAt, their relative order in the merged output matches arrival order
  3. Per-ResponseId grouping — Messages from each ResponseId appear as a contiguous block (no interleaving), enabling per-agent grouping in multi-agent scenarios
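The three invariants can be illustrated with a minimal Python sketch. `merge_updates` is a hypothetical stand-in that treats each update as one message (the real merger also coalesces streaming updates into messages). Ordering the blocks by each ResponseId's first appearance is an assumption; the invariants as summarized here only require that each block be contiguous and that dangling updates come last.

```python
from typing import Dict, List, Optional, Tuple

# (response_id, text); a response_id of None marks a "dangling" update.
Update = Tuple[Optional[str], str]


def merge_updates(updates: List[Update]) -> List[str]:
    """Groups updates per ResponseId in first-seen order, then appends dangling ones."""
    groups: Dict[str, List[str]] = {}  # dicts preserve insertion order (Python 3.7+)
    dangling: List[str] = []
    for response_id, text in updates:
        if response_id is None:
            # Invariant 1: updates without a ResponseId are appended at the end.
            dangling.append(text)
        else:
            # Invariants 2 and 3: arrival order inside one contiguous block per ResponseId.
            groups.setdefault(response_id, []).append(text)
    merged: List[str] = []
    for texts in groups.values():
        merged.extend(texts)
    merged.extend(dangling)
    return merged


interleaved = [("r1", "a1"), ("r2", "b1"), ("r1", "a2"), (None, "x"), ("r2", "b2")]
print(merge_updates(interleaved))  # ['a1', 'a2', 'b1', 'b2', 'x']
```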

Cleanup:

  • Removed unused createdTimes HashSet that was populated but never consumed

Test coverage added in MessageMergerTests:

  • Insertion-order preservation with no timestamps
  • Insertion-order preservation with mixed timestamps
  • Determinism across repeated runs with mixed timestamps
  • Per-ResponseId grouping for interleaved multi-agent streams
  • Per-ResponseId grouping with distinct response IDs
  • Function call/result ordering preservation
  • FinishReason propagation
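Grouping checks like the per-ResponseId tests above can be phrased as a reusable predicate. The helper below is a hypothetical Python sketch, not code from MessageMergerTests: it collapses consecutive duplicate ResponseIds with `itertools.groupby` and reports whether any ResponseId starts a second run, which is exactly an interleaving violation.

```python
from itertools import groupby
from typing import Hashable, Sequence


def is_grouped_contiguously(response_ids: Sequence[Hashable]) -> bool:
    """True when every ResponseId occupies a single contiguous run of the merged output."""
    # groupby yields one key per run of equal consecutive values; a ResponseId
    # that reappears after another one produces a second run with the same key.
    runs = [key for key, _ in groupby(response_ids)]
    return len(runs) == len(set(runs))


print(is_grouped_contiguously(["r1", "r1", "r2", "r2", None]))  # True
print(is_grouped_contiguously(["r1", "r2", "r1"]))              # False
```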

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? No — this fix preserves intended behavior while correcting a subtle ordering bug

Copilot AI requested review from Copilot and lokitoth and removed request for Copilot May 13, 2026 19:49
moonbox3 added the documentation (Improvements or additions to documentation), .NET, and workflows (Related to Workflows in agent-framework) labels May 13, 2026
github-actions (bot) changed the title from "Validate MessageMerger ordering invariants" to ".NET: Validate MessageMerger ordering invariants" May 13, 2026
lokitoth moved this to In Progress in Agent Framework May 14, 2026
lokitoth marked this pull request as ready for review May 14, 2026 02:03
lokitoth marked this pull request as draft May 14, 2026 04:49
HandoffAgentExecutor synthesizes a 'Transferred.' tool-result update for
each handoff function call. That update was created without setting
ResponseId, so MessageMerger routed it to the global dangling bucket and
flushed all such tool results at the very end of the merged
AgentResponse, breaking per-step grouping for multi-step handoffs (see
#4544). Streaming output already preserved order because updates are
yielded directly without going through the merger.

Fix: stamp the synthesized update with the same ResponseId as the
preceding agent stream updates so it groups with the agent's other
messages in MessageMerger.

Also adds a regression test that drives the real workflow through
WorkflowHostAgent.RunAsync over a 3-agent handoff chain and asserts the
per-step message ordering of the merged response.
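The effect of this fix can be illustrated with a simplified model. In this hypothetical Python sketch, `merge_updates` stands in for MessageMerger (one contiguous block per ResponseId, dangling updates last), and `handoff_chain` builds a 3-agent chain from A to B to C in which each agent streams under its own ResponseId. The `stamp_transfer` flag toggles whether the synthesized "Transferred." tool result carries the preceding agent's ResponseId; none of these names come from the agent-framework codebase.

```python
from typing import Dict, List, Optional, Tuple

Update = Tuple[Optional[str], str]  # (response_id, text)


def merge_updates(updates: List[Update]) -> List[str]:
    """Simplified stand-in for MessageMerger: per-ResponseId blocks, dangling last."""
    groups: Dict[str, List[str]] = {}
    dangling: List[str] = []
    for response_id, text in updates:
        if response_id is None:
            dangling.append(text)
        else:
            groups.setdefault(response_id, []).append(text)
    return [text for texts in groups.values() for text in texts] + dangling


def handoff_chain(stamp_transfer: bool) -> List[Update]:
    """Hypothetical A -> B -> C handoff, each agent streaming under its own ResponseId."""
    updates: List[Update] = []
    for agent, response_id, target in [("A", "r1", "B"), ("B", "r2", "C")]:
        updates.append((response_id, f"{agent}: call handoff_to_{target}"))
        # The synthesized tool result: before the fix it carried no ResponseId.
        updates.append((response_id if stamp_transfer else None, f"{agent}: Transferred."))
    updates.append(("r3", "C: final answer"))
    return updates


print(merge_updates(handoff_chain(stamp_transfer=False)))
print(merge_updates(handoff_chain(stamp_transfer=True)))
```

Without the stamp, both "Transferred." results fall into the dangling bucket and appear after C's final answer; with it, each result lands directly after the handoff call that produced it.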
Copilot AI changed the title from ".NET: Validate MessageMerger ordering invariants" to "Fix multi-step handoff message ordering in non-streaming RunAsync (#4544)" May 28, 2026
Copilot AI changed the title from "Fix multi-step handoff message ordering in non-streaming RunAsync (#4544)" to "Fix multi-step handoff message ordering in non-streaming RunAsync" May 28, 2026
Copilot AI changed the title from "Fix multi-step handoff message ordering in non-streaming RunAsync" to "Investigate HandoffToolMismatchRepro against AI Project chat client" May 28, 2026
Copilot AI changed the title from "Investigate HandoffToolMismatchRepro against AI Project chat client" to "test: strengthen Coord/Spec/Coord ping-pong handoff regression test" May 28, 2026
lokitoth marked this pull request as ready for review May 28, 2026 12:45
lokitoth marked this pull request as draft May 28, 2026 12:45
lokitoth (Contributor) commented:

Closing; the test was transferred to #6140.

lokitoth closed this May 29, 2026
github-project-automation (bot) moved this from In Progress to Done in Agent Framework May 29, 2026

Labels

documentation (Improvements or additions to documentation), .NET, workflows (Related to Workflows in agent-framework)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

.NET: Workflow Host-as-Agent can improperly reorder messages in history and when running non-streaming

4 participants